REPORT  DOCUMENTATION  PAGE 


Form  Approved  OMB  NO.  0704-0188 


The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments
regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington
Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA, 22202-4302.
Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection
of information if it does not display a currently valid OMB control number.

PLEASE  DO  NOT  RETURN  YOUR  FORM  TO  THE  ABOVE  ADDRESS. 


5c.  PROGRAM  ELEMENT  NUMBER 
6310AH 


5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 


12. DISTRIBUTION AVAILABILITY STATEMENT
Approved  for  Public  Release;  Distribution  Unlimited 

13.  SUPPLEMENTARY  NOTES 
The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department
of the Army position, policy or decision, unless so designated by other documentation.

14.  ABSTRACT 

To classify disease samples using high-throughput genomic and proteomic data, it is essential to decide which toll-like
receptors and CD markers should be included in a predictor list. Too few markers may not be enough to
discriminate and classify an exposure. Having too many markers is not optimal either, as some of these markers
may be irrelevant to the diagnosis and may dilute the decisive information by adding noise. Efforts are
made to select an optimal set of targets with which to start the training of a set of predictors. This is accomplished by


15.  SUBJECT  TERMS 

classification  algorithms,  pathogens,  CD  markers 

15. NUMBER OF PAGES

17. LIMITATION OF ABSTRACT

UU 

Standard  Form  298  (Rev  8/98) 
Prescribed by ANSI Std. Z39.18


19a.  NAME  OF  RESPONSIBLE  PERSON 

Rasha Hammamieh

19b.  TELEPHONE  NUMBER 
301-619-2338 


16.  SECURITY  CLASSIFICATION  OF: 

a.  REPORT 

b.  ABSTRACT 

c.  THIS  PAGE 

UU 

UU 

UU 

7.  PERFORMING  ORGANIZATION  NAMES  AND  ADDRESSES 

Georgetown  University 
37th  and  O  Streets,  NW 

Washington, DC 20057-1789

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

U.S.  Army  Research  Office 
P.O.Box  12211 

Research  Triangle  Park,  NC  27709-2211 


8.  PERFORMING  ORGANIZATION  REPORT 
NUMBER 


10.  SPONSOR/MONITOR'S  ACRONYM(S) 
ARO 

11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

53273-LS.4 


5d.  PROJECT  NUMBER 


3.  DATES  COVERED  (From  -  To) 

1-Aug-2007 - 31-Oct-2013

5a.  CONTRACT  NUMBER 

W911NF-07-1-0479

5b.  GRANT  NUMBER 


2.  REPORT  TYPE 

Final  Report 

4.  TITLE  AND  SUBTITLE 

Final Report: Applying signature extraction and classification
algorithms on expression profiles of CD markers and toll-like
receptors to classify and predict exposures to various pathogens

6.  AUTHORS 
Seid  Muhie 


1.  REPORT  DATE  (DD-MM-YYYY) 

10-02-2016 


Report  Title 

Final Report: Applying signature extraction and classification algorithms on expression profiles of CD markers and
toll-like receptors to classify and predict exposures to various pathogens

ABSTRACT 

To classify disease samples using high-throughput genomic and proteomic data, it is essential to decide which toll-like receptors and CD
markers should be included in a predictor list. Too few markers may not be enough to discriminate and classify an exposure. Having too
many markers is not optimal either, as some of these markers may be irrelevant to the diagnosis and may dilute the decisive information
by adding noise. Efforts are made to select an optimal set of targets with which to start the training of a set of predictors.

Various algorithms and tools have been developed and described in the literature. These algorithms will serve as a foundation for the development
of the statistical classification tool.

We will examine these algorithms to identify the best prediction model and feature extraction method.

Initially, data generated using cDNA microarrays will be processed, filtered and analyzed using in-house data analysis tools. Expression
profiles for the toll-like receptors and CD markers for each pathogen at various time points will be extracted. These profiles will be used to
identify the markers that are good discriminators for a given pathogen at a given time point. In analyzing the data, we make
two assumptions: 1) the distribution of the gene intensities in a sample is normal, and 2) a gene is a good discriminator if it is present at a
consistently high level in one class and absent or present at a consistently low level in the other class.

To  validate  each  list  of  predictors,  we  will  use  our  database  of  gene  expression  as  a  training  set  and  add  some  blinded  samples  to  see  whether 
these  predictors  are  able  to  identify  an  exposure  by  analyzing  the  expression  profiles  of  toll  like  receptors  and  CD  markers  correlated  with 
this  exposure. 
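
The second assumption stated in the abstract (a good discriminator is consistently high in one class and consistently low in the other) can be illustrated with a signal-to-noise style score. The sketch below is only an illustration under stated assumptions: the scoring formula, the function names, and the intensity values are hypothetical and are not taken from the in-house tools described in this report.

```python
from statistics import mean, stdev

def signal_to_noise(class_a, class_b):
    """Score one gene as (mean_a - mean_b) / (sd_a + sd_b).

    A large absolute score means the gene is consistently high in one
    class and consistently low in the other (assumed scoring rule).
    """
    spread = stdev(class_a) + stdev(class_b)
    if spread == 0:
        raise ValueError("both classes are constant; score is undefined")
    return (mean(class_a) - mean(class_b)) / spread

def rank_markers(exposed, control):
    """exposed/control: {gene: [intensities]}; genes sorted by |score|."""
    scores = {g: signal_to_noise(exposed[g], control[g]) for g in exposed}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical log-intensities, three replicates per class.
exposed = {"TLR2": [9.8, 10.1, 10.0], "CD14": [6.0, 9.0, 3.0], "CD69": [5.1, 5.0, 4.9]}
control = {"TLR2": [4.0, 4.2, 3.9], "CD14": [5.0, 8.0, 4.0], "CD69": [5.0, 5.2, 4.8]}

ranking = rank_markers(exposed, control)
print(ranking[0][0])  # best discriminator in this toy data
```

In this toy data the first marker separates the classes cleanly, the second is noisy in both classes, and the third shows no difference, so only the first would be a candidate predictor.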
Applying signature extraction and classification algorithms on expression profiles of CD markers and toll-like
receptors to classify and predict exposures to various pathogens


•  Statement  of  the  problem  studied 

Existing pathogen detection and identification tools are not always effective, especially in the early stages
post exposure. Host response to biological threat agents is a very important issue in the case of
an outbreak. Identification of signature markers for exposures to various biological threat agents
provides a vital tool for classification of outbreaks.

This  project  attempts  to  address  this  need  by  exploring  the  feasibility  of  employing  computational 
methods  to  determine  predictors  and  classifiers  of  various  pathogens. 

We  have  obtained  a  large  body  of  experimental  data  characterizing  effects  over  time  for  exposure  to 
various  biological  threat  agents.  We  are  establishing  a  database  of  gene  expression  profiles  at  multiple 
time  points  for  thousands  of  genes.  In  order  to  diagnose  and  treat  not  only  known  biothreats,  but  also 
newly engineered ones, it is important to identify the unique biomolecular signatures underlying the
observed  host  response  to  a  pathogen.  Computational  approaches  are  essential  to  organize  and 
visualize  the  variety  of  data  and  to  facilitate  feature  extraction  and  prediction  of  an  exposure. 


•  Summary  of  the  most  important  results 

We developed an algorithm to apply predictive modeling and feature extraction using our continuously
growing microarray gene expression database, obtained by exposing PBMCs to various classes of
pathogens (virus, toxin, gram-negative and gram-positive bacteria) at various time points. We carried
out gene expression analysis for SEB, Dengue, Plague, VEE, and Bot toxin at various time points,
with more than three replicates each.

Host gene expression in vitro: Microarray analysis was carried out at 3-6 time periods post exposure of
PBMC to each pathogen or vehicle. Prior studies [11] showed specific gene sets related to sex, age and
other parameters; therefore, it was important to first identify genes that normally vary among
healthy humans. Data from only the control samples of these healthy donors were subjected to ANOVA
(p ≤ 0.05), and 6% of the genes varied widely among the healthy donors.
These genes, which showed inconsistent expression profiles, were excluded from further comparisons
among the data sets from both control and exposed samples. This provided a baseline to confidently
identify transcriptional responses induced by bacteria (anthrax, plague, Brucella), toxins (CT, SEB,
BoNTA), or viruses (Dengue, VEE).
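
The baseline filter described above (testing control samples for donor-to-donor variation and excluding the genes that vary) could be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: to stay within the Python standard library it swaps the parametric ANOVA p-value for a permutation p-value on the same one-way F statistic, and the function names, gene labels, and intensities are hypothetical.

```python
import random
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic; groups is a list of lists of values."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        m = mean(g)
        ss_within += sum((v - m) ** 2 for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_p(groups, n_perm=2000, seed=0):
    """Share of donor-label shufflings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def consistent_genes(controls, alpha=0.05):
    """Keep genes whose control expression does not differ among donors."""
    return [gene for gene, groups in controls.items()
            if permutation_p(groups) > alpha]

# Hypothetical control intensities: one inner list per healthy donor.
controls = {
    "geneA": [[1.0, 1.1, 0.9], [5.0, 5.1, 4.9], [9.0, 9.1, 8.9]],  # donor-dependent
    "geneB": [[1.0, 1.2, 0.8], [1.1, 0.9, 1.0], [0.9, 1.1, 1.0]],  # stable
}
kept = consistent_genes(controls)
print(kept)
```

Only genes that pass this filter would be carried forward into the comparisons between control and exposed samples.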


Consistency  of  responses:  We  used  PBMC  from  at  least  3  different  donors,  exposing  cells  to  pathogen  or 
vehicle  for  specified  periods  of  time. 


Unique gene patterns induced by BTAs: The gene responses were dissected to identify sets of
genes that differentiate one agent from another based on the patterns of host gene induction.
The GeneSpring (Silicon Genetics, California) clustering diagram illustrates gene expression
patterns that can discriminate among the various pathogenic agents by identifying sets of
genes that are up regulated or down regulated for specific pathogens. The combination
of these selected genes can be the foundation for designing specific diagnostic assays for
exposure to one or more agents. Additionally, gene patterns for the earliest exposure to SEB or
CT clustered less closely with the later exposure times, but when observed relative to all
pathogens, the four exposure time periods for SEB were relatively closely clustered. A striking
observation is that for all pathogens except SEB, the longest exposure times differ markedly
from the clusters of the early time periods. For B. anthracis, Y. pestis, B. melitensis, and CT,
those late exposure times cluster together across pathogens. This loss of pathogen-specific
responses in vitro after lengthy exposure was not seen in the in vivo studies.
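
The kind of clustering summarized above can be sketched with average-linkage agglomerative clustering on a correlation distance. This is a hedged illustration only: the clustering settings used in GeneSpring are not given in this report, and the condition labels and log ratios below are hypothetical.

```python
from math import sqrt

def pearson_distance(x, y):
    """1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage(profiles):
    """Merge the two closest clusters until one remains.

    profiles: {condition: [log ratios]}.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(pearson_distance(profiles[a], profiles[b])
                        for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical log ratios for four genes under four conditions.
profiles = {
    "SEB_2h": [2.0, 1.5, -0.5, 0.1],
    "SEB_8h": [2.2, 1.4, -0.6, 0.0],
    "plague_24h": [-1.0, 0.2, 1.8, 1.1],
    "anthrax_24h": [-0.9, 0.3, 1.7, 1.2],
}
merges = average_linkage(profiles)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

In this toy data the two SEB time points join each other, and the two late exposures to different bacteria join each other, before the two groups are finally merged at a much larger distance.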

Use of training and test data sets for classifying test exposures: To determine whether the microarray
data obtained in this study can be used to predict the exposure type of an uncharacterized sample or
condition, we applied a supervised learning method for class prediction (GeneSpring) that uses the
k-nearest neighbor algorithm. When the algorithm was applied to the data set (training set) to predict the
exposure type of a data set obtained from an exposure to Y. pestis (test set), we were able to correctly
predict the type of exposure with p < 0.02. We previously reported that a set of predictor genes was
identified when samples from exposures of piglets to SEB were used as test sets [12, 13].
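
The k-nearest neighbor idea behind this class prediction step can be sketched in a few lines. This is a simplified illustration, not the GeneSpring implementation: it uses plain Euclidean distance and a majority vote, it does not compute the p-value reported above, and the profiles and the value of k are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(training, sample, k=3):
    """Label a sample by majority vote among its k nearest training profiles.

    training: list of (profile, label) pairs; sample: profile of equal length.
    """
    neighbors = sorted(training, key=lambda item: dist(item[0], sample))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical expression values for three predictor genes.
training = [
    ([5.0, 0.2, 1.0], "Y. pestis"),
    ([4.8, 0.1, 1.2], "Y. pestis"),
    ([5.2, 0.3, 0.9], "Y. pestis"),
    ([0.5, 4.0, 1.0], "SEB"),
    ([0.4, 4.2, 1.1], "SEB"),
    ([0.6, 3.9, 0.8], "SEB"),
]
blinded_sample = [4.9, 0.4, 1.1]
prediction = knn_predict(training, blinded_sample)
print(prediction)
```

The training profiles play the role of the expression database, and the blinded sample stands in for an uncharacterized exposure whose class is to be predicted.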

Functional  classification  of  genes  differentially  regulated: 

Gene ontological analysis was carried out for the genes that were differentially expressed.
Comparison of gene responses based on functional similarities showed, not surprisingly, many up
regulated genes coding for inflammatory mediators. We clustered and sorted the differentially
expressed genes by their functional classification. For gene group (i) "Growth Factor, Cytokines &
Chemokines," anthrax, Brucella and SEB showed major up regulation of most genes coding for
inflammatory mediators; the other 5 agents had mixed or modest effects. Similarly, categories (iii)
"Interleukins and Interferon Receptors" and (iv) "Interleukins" showed up regulation by most
pathogens, notable exceptions being the viruses. Down regulated genes, though seen extensively
throughout the study, displayed functional clustering for each pathogenic agent, such as (ii)
"Homeostasis & detoxification," (v) "Ligand-gated ion channels," and so forth. Plague induced high levels
of interleukin-6, macrophage inflammatory protein-1 beta, tumor necrosis factor-alpha (TNF-α), and
granulocyte macrophage colony stimulating factor (GM-CSF) when compared with Brucella and anthrax.
Not surprisingly, the superantigen SEB displayed kinetic patterns for over expression of interferon-γ,
IL-2, IL-6, MIP-1α, and GM-CSF. There are major differences in expression of death receptors, homeostasis,
and caspases, examples of which include defensins and certain oxidases (homeostasis) that are down
regulated by plague and SEB. A large number of transcription factors are down regulated by anthrax,
Brucella, and SEB, but plague consistently down regulated the widest range of these genes.
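
Sorting differentially expressed genes by functional classification, as described above, amounts to grouping genes by category and by direction of change. The sketch below shows one way to do this under stated assumptions: the two-fold threshold, the gene-to-category assignments, and the log ratios are hypothetical and are not results from this study.

```python
from collections import defaultdict

def summarize_by_category(log2_ratios, categories, threshold=1.0):
    """Group genes by functional category and direction of change.

    log2_ratios: {gene: log2(exposed / control)}
    categories:  {gene: functional category}
    threshold:   minimum |log2 ratio| to call a gene changed (assumed: 1.0)
    """
    summary = defaultdict(lambda: {"up": [], "down": []})
    for gene, ratio in log2_ratios.items():
        if ratio >= threshold:
            summary[categories[gene]]["up"].append(gene)
        elif ratio <= -threshold:
            summary[categories[gene]]["down"].append(gene)
    return dict(summary)

# Hypothetical values for one agent at one time point.
log2_ratios = {"IL6": 3.1, "TNF": 2.4, "DEFA1": -1.8, "GAPDH": 0.1}
categories = {
    "IL6": "Interleukins",
    "TNF": "Growth Factor, Cytokines & Chemokines",
    "DEFA1": "Homeostasis & detoxification",
    "GAPDH": "Other",
}
summary = summarize_by_category(log2_ratios, categories)
for category, genes in summary.items():
    print(category, genes)
```

Repeating this summary for each agent and time point gives per-category counts that can be compared across pathogens.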

Gene responses induced by BTAs in vivo; comparison with in vitro changes: To determine gene
changes induced by BTAs in an animal model, nonhuman primates (NHP) were exposed to B. anthracis spores by aerosol
challenge. This model has been characterized previously to mimic inhalation anthrax in humans. Blood
samples were collected 24 h, 48 h, and 72 h post exposure (by 72 h the NHP were beginning to show
signs of the illness, which progresses very rapidly to lethality). The gene expression profiles for in vitro
exposure of PBMC to anthrax spores were compared with those found in PBMC isolated from NHP at various time
periods. Even by 24 h, a robust response was observed, showing up regulation of genes
coding for proteases; proteasome components c2, c3, c5; various cytokines; pro-apoptotic genes; cyclic
adenosine monophosphate (cAMP)-related kinases; cAMP-regulated transcription factors; and hypoxia
inducible factor-1 (HIF-1). Down regulated genes included tyrosine kinases, cytokine receptors, growth
factors, and adenosine diphosphate (ADP) ribosylation factors. Comparison of the in vivo results with
the in vitro changes induced by anthrax showed remarkable similarities in gene patterns, although many
more changes were observed in vivo than in vitro. Certain surface antigens showed significant alteration
that was unique to anthrax exposure. Diagrams were constructed to identify sets of genes that were up
regulated at either 24 or 72 h; other gene sets showed up regulation at both time periods.
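
The partition behind such diagrams (genes up regulated only at 24 h, only at 72 h, or at both) reduces to set operations. The following sketch is illustrative only; the gene lists are hypothetical and do not reproduce the study's results.

```python
def partition_by_time(up_24h, up_72h):
    """Split up regulated genes into 24 h only, 72 h only, and both."""
    up_24h, up_72h = set(up_24h), set(up_72h)
    return {
        "24 h only": up_24h - up_72h,
        "72 h only": up_72h - up_24h,
        "both": up_24h & up_72h,
    }

# Hypothetical lists of up regulated genes at two time points.
groups = partition_by_time(
    up_24h=["HIF1A", "IL6", "CASP3"],
    up_72h=["IL6", "TNF", "CASP3"],
)
for name, genes in groups.items():
    print(name, sorted(genes))
```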

A few genes that showed changes induced by B. anthracis exposure were selected and
confirmed by RT-PCR, and the level of expression was compared both in vitro and in vivo after anthrax
exposure. Altered regulation of that G-protein was not seen with the other pathogenic agents. In an
experiment in which NHP were exposed to SEB, IL-6 and guanylate binding protein GBP-2 were up regulated (6- and
65-fold, respectively) by 30 min post-exposure, and the increased expression persisted through 24 h.